Comparative Study for Multi-Speaker Mongolian TTS with a New Corpus
Authors
Abstract
Low-resource text-to-speech (TTS) synthesis is a very promising research direction. Mongolian is the official language of the Inner Mongolia Autonomous Region and is spoken by more than 10 million people worldwide. As a representative low-resource language, Mongolian has a relative lack of open-source datasets for TTS. Therefore, we make public an open-source multi-speaker Mongolian TTS dataset, named MnTTS2, for related researchers. In this work, we invited three announcers to record topic-rich speeches. Each announcer recorded 10 h of speech, and the whole dataset was 30 h in total. In addition, we built two baseline systems based on state-of-the-art neural architectures: a FastSpeech 2 model with a HiFi-GAN vocoder, and a fully end-to-end VITS model for multiple speakers. On the FastSpeech2+HiFi-GAN system, the three speakers scored 4.0 or higher in both the naturalness evaluation and the speaker similarity evaluation. On the VITS model, the three speakers achieved scores of 4.5 or higher for both naturalness and speaker similarity. The experimental results show that the published MnTTS2 dataset can be used to build robust Mongolian multi-speaker TTS models.
Similar resources
Corpus building for Mongolian language
This paper presents ongoing research aimed at building the first corpus for the Mongolian language, of 5 million words, by annotating and tagging corpus texts according to the TEI XML format (McQueen, 2004). A tool, MCBuilder, which supports flexible manual annotation and manipulation of the corpus texts with XML structure, is also presented.
A new type-II fuzzy logic based controller for non-linear dynamical systems with application to 3-PSP parallel robot
Abstract: Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are, however, newer and more promising approaches that have recently been applied to various fields due to their significant contribution, especially when noise (an important instance of uncertainty) emerges. During the design of type-I fuz...
Improving TTS with Corpus-Specific Pronunciation Adaptation
Text-to-speech (TTS) systems are built on speech corpora which are labeled with carefully checked and segmented phonemes. However, phoneme sequences generated by automatic grapheme-to-phoneme converters during synthesis are usually inconsistent with those from the corpus, thus leading to poor quality synthetic speech signals. To solve this problem, the present work aims at adapting automaticall...
Corpus-Based Unit Selection TTS for Hungarian
This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. 5260 sentences were recorded, creating a speech corpus containing 11 hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound bound...
Part of Speech Tagging for Mongolian Corpus
This paper introduces the current result of a research work which aims to build a 5 million tagged word corpus for Mongolian. Currently, around 1 million words have been automatically tagged by developing a POS tagset and a bigram POS tagger.
Journal
Journal title: Applied Sciences
Year: 2023
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app13074237